Diversity in Ensemble Feature Selection

Authors

  • Alexey Tsymbal
  • Mykola Pechenizkiy
  • Pádraig Cunningham
Abstract

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. It has been shown theoretically and experimentally that, for an ensemble to be effective, it should consist of high-accuracy base classifiers that have high diversity in their predictions. One technique that has proved effective for constructing an ensemble of accurate and diverse base classifiers is to use different feature subsets, so-called ensemble feature selection. Many ensemble feature selection strategies incorporate diversity as a component of the fitness function in the search for the best collection of feature subsets. A number of ways to quantify diversity in ensembles of classifiers are known, but little research has been done on their appropriateness for ensemble feature selection. In this paper, we compare seven measures of diversity with regard to their possible use in ensemble feature selection. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the ensemble accuracy and other characteristics of the ensembles built with ensemble feature selection based on the considered measures of diversity. We consider five search strategies for ensemble feature selection: simple random subsampling, genetic search, hill-climbing, and ensemble forward and backward sequential selection. In the experiments, we show that, in some cases, the ensemble feature selection process can be sensitive to the choice of the diversity measure, and that the question of the superiority of a particular measure depends on the context of the use of diversity and on the data being processed.
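To make the idea of diversity as a component of the fitness function concrete, here is a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: it uses pairwise disagreement (the fraction of instances on which two classifiers predict different labels) as one possible diversity measure, a hypothetical fitness of the form accuracy + alpha * diversity, and a hypothetical 0.5 inclusion probability for simple random subsampling of features. The function names and the `alpha` parameter are illustrative only.

```python
import random
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def ensemble_diversity(all_preds):
    """Average pairwise disagreement over all pairs of base classifiers."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def accuracy(preds, labels):
    """Fraction of instances predicted correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def fitness(member_preds, other_preds, labels, alpha=1.0):
    """Hypothetical fitness of one member: accuracy + alpha * diversity.

    Diversity here is the member's mean disagreement with the other members;
    alpha (an assumed parameter) weights diversity against accuracy.
    """
    div = sum(disagreement(member_preds, o) for o in other_preds) / len(other_preds)
    return accuracy(member_preds, labels) + alpha * div


def random_feature_subsets(n_features, n_members, rng):
    """Simple random subsampling: keep each feature with probability 0.5 (assumed)."""
    subsets = []
    for _ in range(n_members):
        subset = [f for f in range(n_features) if rng.random() < 0.5]
        if not subset:  # guarantee every base classifier sees at least one feature
            subset = [rng.randrange(n_features)]
        subsets.append(subset)
    return subsets


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0]
    p1 = [1, 0, 1, 0, 0]
    p2 = [1, 1, 1, 1, 0]
    p3 = [0, 0, 1, 1, 0]
    print(ensemble_diversity([p1, p2, p3]))
    print(fitness(p1, [p2, p3], labels, alpha=0.5))
    print(random_feature_subsets(6, 4, random.Random(0)))
```

In a search strategy such as hill-climbing, a fitness of this kind would be re-evaluated each time a feature is added to or removed from a member's subset. Replacing `disagreement` with a different measure changes which subsets the search prefers, which is the sensitivity the abstract refers to.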


Similar resources

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...


The Usefulness of Ensemble Regressions and Optimal Predictor Variable Selection Methods in Predicting Stock Returns

This paper examines the usefulness of ensemble regressions and optimal predictor variable selection methods (including the correlation-based method and Relief) for predicting the stock returns of companies listed on the Tehran Stock Exchange. To evaluate the performance of ensemble regression, the evaluation metrics (including mean absolute percentage error, root mean squared error, and the coefficient of determination) for the predictions of this method, with linear regression and artificial neural networks...


Enhancing Ensemble Performance through Feature Selection and Hybridization

Ensembles have proved to be a successful approach for enhancing the performance of a single classifier. However, two key factors directly influence the outcome of an ensemble: the accuracy of each member and the diversity between the members. Many approaches have been used in the literature to create this diversity. In this paper, we add to them a novel approach, in which cla...


MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection

Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...


Ensemble Feature Weighting Based on Local Learning and Diversity

Recently, besides performance, the stability (robustness, i.e., the variation in feature selection results due to small changes in the data set) of feature selection has received more attention. Ensemble feature selection, where multiple feature selection outputs are combined to yield more robust results without sacrificing performance, is an effective method for stable feature selection. ...



Publication date: 2003